External Table

The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system. Usually when you create tables in hive, it moves them to a location - "/user/hive/warehouse". An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir. The following hive command creates a table with data.

Employee Table data

7369,SMITH,CLERK,7902,17-DEC-1980,800,100,20
7499,ALLEN,SALESMAN,7698,20-FEB-1981,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-1981,1250,500,30
7566,JONES,MANAGER,7839,2-APR-1981,2975,100,20
7654,MARTIN,SALESMAN,7698,28-SEP-1981,1250,1400,30
7698,BLAKE,MANAGER,7839,1-MAY-1981,2850,100,30
7782,CLARK,MANAGER,7839,9-JUN-1981,2450,100,10
7788,SCOTT,ANALYST,7566,09-DEC-1982,3000,100,20
7839,KING,PRESIDENT,4561,17-NOV-1981,5000,100,10
7844,TURNER,SALESMAN,7698,8-SEP-1981,1500,10,30
7876,ADAMS,CLERK,7788,12-JAN-1983,1100,100,20
7900,JAMES,CLERK,7698,3-DEC-1981,950,100,30
7902,FORD,ANALYST,7566,3-DEC-1981,3000,100,20
7934,MILLER,CLERK,7782,23-JAN-1982,1300,100,10

Create Employee Table

create external table emp
(
   empno int,
   ename varchar(20),
   job varchar(20),
   mgr int,
   hiredate string,
   sal float,
   comm float,
   deptno int
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE

LOAD DATA INPATH '/home/cloudera/data/emp.txt' INTO TABLE emp;
To create external table, simply point to the location of data while creating the tables. This will ensure that the data is not moved into a location inside the warehouse directory.

create external table emp
(
   empno int,
   ename varchar(20),
   job varchar(20),
   mgr int,
   hiredate string,
   sal float,
   comm float,
   deptno int
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION data/hive_Data/emp; 

Drop External Tables In Hive
ALTER TABLE some.table DROP PARTITION (part="some") PURGE;
hdfs dfs -rm -R /path/to/table/basedir

For an external table, If you are trying to drop a partition and as-well would like to delete the data. This can be achieved as below.

  • Alter external table as internal table -- by changing the TBL properties as external =false
  • Drop the partitions -- when you drop the partitions, data pertained to the partitions will also be dropped as now this table is managed table
  • Alter back the table as external=True.

By doing this, there more controlled on what we are deleting and drop the partitions rather than using hadoop rm command

How to enable subdirectory support in hive external tables?
set hive.input.dir.recursive=true;
set hive.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
set hive.execution.engine=spark;
 

No comments:

Post a Comment